Unsupervised Learning vs Semi-Supervised Learning

October 04, 2022

Introduction

Artificial Intelligence (AI) is advancing rapidly, and its applications are reshaping many industries. One of its key branches is machine learning, in which systems learn from examples rather than following explicitly programmed rules. Machine learning can be approached in several ways; this article compares two of them: unsupervised learning and semi-supervised learning.

Unsupervised Learning

Unsupervised learning is a machine learning approach in which the model is trained without labelled examples, that is, without anyone telling it the correct answer for any data point. The algorithm receives raw data and looks for structure in it, for example groups of similar items (clustering), items that frequently occur together (association rules), or a more compact representation of the data (dimensionality reduction). The aim is to uncover intrinsic structures and relationships using only unlabelled data.

For example, suppose we have a dataset of customer purchases in a supermarket. An unsupervised algorithm could find patterns in these purchases, such as which products are commonly bought together.
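
The "commonly purchased together" idea can be sketched with a simple pair count. This is a minimal illustration, not a full association-rule miner; the baskets, the `pair_counts` helper, and the threshold of 2 are all assumptions made up for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: each set is one customer's purchase.
baskets = [
    {"chips", "soda", "salsa"},
    {"chips", "soda"},
    {"bread", "butter"},
    {"chips", "salsa"},
    {"bread", "butter", "jam"},
    {"soda", "chewing gum"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered product pair."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair one canonical order, e.g. ("chips", "soda").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)

# Keep pairs seen in at least 2 baskets (an arbitrary illustrative threshold).
frequent_pairs = {pair: n for pair, n in counts.items() if n >= 2}

for pair, n in sorted(frequent_pairs.items()):
    print(pair, n)
```

With these baskets, the pairs that reach the threshold are chips and salsa, chips and soda, and bread and butter. No labels were supplied at any point; the pattern comes from the purchase data alone.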

Semi-Supervised Learning

Semi-supervised learning, on the other hand, sits between supervised and unsupervised learning: it trains a model on a small amount of human-labelled data together with a typically much larger amount of unlabelled data. The labelled data tells the model what the target categories are, while the unlabelled data reveals the overall structure of the data, much as in unsupervised learning, which helps the model generalise from the few labels it has.

Continuing with our supermarket example, semi-supervised learning would involve labelling a few products by hand (e.g., chips, soda, chewing gum) and then letting the algorithm use the unlabelled purchase data to identify other related products.
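
One way to picture this is a simplified, majority-vote form of label propagation over a co-purchase graph. The sketch below is an illustration under stated assumptions, not the algorithm of any particular library: the baskets, the category names, and the `propagate_labels` helper are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical baskets and a handful of human-provided category labels.
baskets = [
    {"chips", "soda", "salsa"},
    {"chips", "salsa"},
    {"soda", "chewing gum"},
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"butter", "jam"},
]
seed_labels = {"chips": "snacks", "soda": "snacks", "bread": "breakfast"}


def propagate_labels(baskets, seed_labels, max_rounds=10):
    """Spread seed labels to unlabelled products via co-purchase links."""
    # Weighted graph: weight = number of baskets two products share.
    neighbours = defaultdict(Counter)
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            neighbours[a][b] += 1
            neighbours[b][a] += 1

    labels = dict(seed_labels)
    for _ in range(max_rounds):
        updates = {}
        for product in sorted(neighbours):
            if product in labels:
                continue  # never overwrite a seed or an earlier decision
            votes = Counter()
            for other, weight in neighbours[product].items():
                if other in labels:
                    votes[labels[other]] += weight
            if votes:
                # Most votes wins; ties are broken alphabetically.
                updates[product] = min(votes, key=lambda lab: (-votes[lab], lab))
        if not updates:
            break  # nothing left to label
        labels.update(updates)
    return labels


labels = propagate_labels(baskets, seed_labels)
for product in sorted(labels):
    print(product, "->", labels[product])
```

Only three products were labelled by hand; salsa and chewing gum pick up "snacks" and butter and jam pick up "breakfast" from the products they are bought with. Real semi-supervised methods usually work with soft probabilities and feature similarity rather than hard votes, so treat this as a picture of the idea only.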

Comparison

There are several key differences between unsupervised and semi-supervised learning:

  1. Data Requirements: Unsupervised learning requires only unlabelled data, while semi-supervised learning needs both labelled and unlabelled data to train a model.
  2. Accuracy: The two are not always directly comparable, because unsupervised methods produce groupings or patterns rather than predictions of a known target. When the goal is to predict a specific label, semi-supervised learning usually produces more accurate models, because the labelled examples tell it what the target is.
  3. Computational Cost: Compute cost depends more on the specific algorithm and the size of the dataset than on the category. Semi-supervised learning adds the human cost of producing labels on top of the compute needed to process both labelled and unlabelled data.
  4. Applicability: Unsupervised learning is suitable when there are no labels at all, or when the goal is to explore the data without a predefined target. Semi-supervised learning is suitable when there is a specific target to predict and some labelled data is available, but labelling everything would be too expensive.
  5. Scalability: Unsupervised learning scales to new datasets with less human effort because it needs no labelling, while semi-supervised learning requires people to label at least a small sample.
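
The first two points can be made concrete with a small sketch. It clusters hypothetical basket totals with a tiny two-cluster k-means (unsupervised), then uses two hand-labelled baskets to give the clusters names, an approach sometimes called cluster-then-label (semi-supervised). The data, the `two_means_1d` helper, and the label names are assumptions for illustration.

```python
def two_means_1d(values, rounds=20):
    """Tiny 1-D k-means with k=2; returns a cluster index (0 or 1) per value."""
    centres = [min(values), max(values)]  # deterministic initialisation
    assignment = []
    for _ in range(rounds):
        # Assign each value to the nearer centre.
        assignment = [
            0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            for v in values
        ]
        # Move each centre to the mean of its members.
        for k in (0, 1):
            members = [v for v, a in zip(values, assignment) if a == k]
            if members:
                centres[k] = sum(members) / len(members)
    return assignment


# Hypothetical basket totals in some currency.
totals = [4.0, 5.5, 6.0, 48.0, 52.0, 55.5]

# Unsupervised: only anonymous cluster ids come out.
assignment = two_means_1d(totals)
print(assignment)

# Semi-supervised (cluster-then-label): two baskets were labelled by hand,
# keyed by their position in `totals`.
known = {0: "quick trip", 4: "weekly shop"}
cluster_names = {assignment[i]: label for i, label in known.items()}
predicted = [cluster_names[a] for a in assignment]
print(predicted)
```

The unsupervised step finds the two groups but cannot say what they mean, and there is nothing to measure its accuracy against. Two labels are enough to turn the groups into named predictions, which is where the accuracy comparison in point 2 starts to make sense.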

Conclusion

In conclusion, both unsupervised learning and semi-supervised learning have advantages and disadvantages, and the right choice depends on your use case. If you have no labelled data, or you want to explore the data without a predefined target, unsupervised learning is a good choice. If you already have some labelled data and need accurate predictions of a specific target, semi-supervised learning is the better choice.

© 2023 Flare Compare